Gaussian Process Vine Copulas for Multivariate Dependence 



David Lopez-Paz 

Max Planck Institute for Intelligent Systems 

Jose Miguel Hernandez-Lobato 
Zoubin Ghahramani 

University of Cambridge 



DAVID. LOPEZOtUE.MPG.DE 



jmh233@eng.cam.ac.uk 
zoubin@eng.cam.ac.uk 




Abstract 

Copulas allow to learn marginal distributions 
separately from the multivariate dependence 
structure (copula) that links them together 
into a density function. Vine factorizations 
ease the learning of high-dimensional copu- 
las by constructing a hierarchy of conditional 
bivariate copulas. However, to simplify in- 
ference, it is common to assume that each 
of these conditional bivariate copulas is inde- 
pendent from its conditioning variables. In 
this paper, we relax this assumption by dis- 
covering the latent functions that specify the 
shape of a conditional copula given its con- 
ditioning variables. We learn these functions 
by following a Bayesian approach based on 
sparse Gaussian processes with expectation 
propagation for scalable, approximate infer- 
ence. Experiments on real-world datasets 
show that, when modeling all conditional de- 
pendencies, we obtain better estimates of the 
underlying copula of the data. 

1. Introduction 

Copulas are becoming a popular approach in machine 
learning to describe multivariate data (Elidan, 2012; 
Kirshner, 2007; Elidan, 2010; Wilson & Ghahramani, 
2010). Estimating multivariate densities is difficult 
due to possibly complicated forms of the data dis- 
tribution and the curse of dimensionality. Copulas 
simplify this process by separating the learning of the 
marginal distributions from the learning of the mul- 
tivariate dependence structure, or copula, that links 
them together into a density model (Joe, 2005). Learn- 
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Figure 1. Two bidimensional densities (left, middle) that 
share the same underlying Gaussian copula, with correla- 
tion parameter 9 = 0.8 (right). The two distributions differ 
because of their marginal distributions, depicted at the top 
of each density plot. 



ing the marginals is easy and can be done using stan- 
dard univariate methods. However, learning the cop- 
ula is more difficult and requires models that can rep- 
resent a broad range of dependence patterns. For the 
two-dimensional case, there exists a large collection of 
parametric copula models (Nelsen, 2006). However, in 
higher dimensions, the number and expressiveness of 
families of parametric copulas is more limited. A solu- 
tion to this problem is given by pair copula construc- 
tions, vine copulas or simply vines (Bedford & Cooke, 
2002; Kurowicka & Cooke, 2006). These are graphical 
models that decompose any multivariate copula into 
a hierarchy of bivariate copulas, where some of them 
will be conditioned on a subset of the data variables. 
The deeper a bivariate copula is in the vine hierarchy, 
the more variables it will be conditioned on. If the 
conditional dependencies described above are ignored, 
vines are a straightforward approach to construct flex- 
ible high-dimensional dependence models using stan- 
dard parametric bivariate copulas as building blocks. 

The impact of ignoring conditional dependencies in 
the copula functions is likely to be problem specific. 
Hobaek et al. (2010) show thorough experiments with 
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synthetic data that, in specific cases, ignoring condi- 
tional dependencies can lead to reasonably accurate 
approximations of the true copula. By contrast, Acar, 
Genest and Neslehova (2012) indicate that this sim- 
plifying assumption can be in other cases misleading, 
and develop a method to condition parametric bivari- 
ate copulas on a single scalar variable. In this paper, 
we extend the work of Acar et al. (2012) and propose 
a general technique to construct arbitrary vine mod- 
els with full conditional parametric bivariate copulas. 
Our results on several real-world datasets show that 
it is often important to take into account conditional 
dependencies when constructing a vine model. 

The proposed method is based on the fact that most 
parametric bivariate copulas can be specified in terms 
of Kendall's rank correlation parameter r £ [—1,1] 
(Joe, 1997). The dependence of the copula on a vec- 
tor of conditioning variables u = {ui, . . . , Udj^ is then 
captured by specifying the relationship t = (t(/(u)), 
where / : M'' — > M is a non-linear function and 
cr : M [—1,1] is a scaling operation. We follow a 
Bayesian approach to learn / from available data. In 
particular, we place a Gaussian process (GP) prior on 
/ and use expectation propagation for approximate in- 
ference (Rasmussen & Williams, 2006; Minka, 2001). 
To make our method scalable, we use sparse GPs based 
on the generalized FITC approximation (Snelson & 
Ghahramani, 2006; Naish-Guzman & Holdcn, 2007). 

2. Copulas and Vines 

When the components of a c?-dimensional random vec- 
tor X = (a;i, . . . ^Xd)^ are independent, their density 
function p(x) can be factorized as 



(1) 



The previous equality does not hold when xi,...,Xd 
are not independent. Nevertheless, the differences can 
be corrected by multiplying the right hand side of (1) 
by a specific function that fully describes any possi- 
ble form of dependence between the random variables 
xi, . . . ,Xd- This function is called the copula of p(x) 
(Nelsen, 2006), and satisfies: 



p(x) = 



Wp{x 



c{P{xi),...,P(xd)), 



(2) 



copula 



where P{xi) is the marginal cumulative distribution 
function (cdf) of the random variable Xi. The copula 
c is the joint multivariate density of P{xi)^ . . . , P{xd) 
and it has uniform marginal distributions, since 



P{x) ~ U[Q, 1] for any random variable x (Casella & 
Bergcr, 2001). This non-linear transformation from x 
to P{x) is known as the Probability Integral Transform 
(PIT). The copula is the density of x after eliminat- 
ing all the marginal information by applying the PIT 
to each individual component of x. Therefore, c de- 
scribes any dependence patterns which do not depend 
on the marginal distributions. If every P{xi) is contin- 
uous, then c is unique for any p(x) (Sklar, 1959). How- 
ever, infinitely many multivariate distributions share 
the same underlying copula (Figure 1). 

The main advantage of copulas is that they separate 
the learning of univariate marginal distributions from 
the learning of the multivariate dependence struc- 
ture that describes how they are coupled (Joe, 2005). 
Learning the marginals is easy and can be done using 
standard univariate methods. However, learning the 
copula is more difficult and requires models that can 
represent a broad range of dependence patterns. For 
the two-dimensional large collection of para- 

metric copula models is available (Nclsen, 2006). Some 
examples are the Gaussian, Student, Clayton, Inde- 
pendent, Gumbel or Frank copulas. Each of these 
families describes a different dependence structure be- 
tween two random variables. An intuitive example is 
the copula that describes independence, that is, the 
independent copula: it has density constant and equal 
to one, as one can infer from equations (1) and (2). 
The Appendix contains more on the bivariate Gaus- 
sian Copula, which is used extensively used through- 
out this paper. 

Although there exist many parametric models for two- 
dimensional copulas, for more than two dimensions 
the number and expressiveness of families of paramet- 
ric copulas is more limited. A solution to this prob- 
lem is given by pair copula constructions, vine copulas 
or simply vines (Joe, 1996; Bedford & Cooke, 2002; 
Kurowicka & Cooke, 2006). 

2.1. Regular Vines 

Vine copulas are hierarchical graphical models that 
factorize a d-dimensional copula density into a product 
of d{d — l)/2 bivariate conditional copula densities. 
They offer great modeling flexibility, since each of the 
bivariate copulas in the factorization can belong to a 
different parametric family. Several types of vines have 
been proposed in the literature. Some examples are 
canonical vines (C- Vines), drawable vines (D-vines) 
or regular vines (R- Vines). In this paper we focus on 
regular vines, since they are a generalization of all the 
other types (Dissman ct al., 2012). 

An R-vine V specifies a factorization of a copula den- 
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sity c{ui, . . . ,Ud) into a product of bivariate condi- 
tional copulas. Such R-vine is constructed by forming 
a nested set of d — 1 undirected trees, in which each of 
their edges corresponds to a conditional bivariate cop- 
ula density. A particular nested set of trees identifies 
a particular valid factorization of c(ui, . . . , Ud)- These 
trees can be sequentially constructed as follows: 

1. Let Ti, . . . , Td-i be the trees in a R-Vine V, each 
of them with set of nodes Vi and set of edges Ei . 

2. Every edge e £ Ei has associated three sets of 
variable indexes C{e),D{e),N{e) C {l,...,d} 
called the conditioned, conditioning and con- 
straint sets of e, respectively. 

3. The first tree in the hierarchy has set of nodes 
Vi = {1, . . . , d} and set of edges Ei, which is ob- 
tained by inferring a spanning tree from a com- 
plete graph Gi over Vi. 



G2/T2 



4. For any edge e G Ei joining nodes j,k S 
C(e) = N{e) = {j, k} and D{e) = {0}. 



Vi, 



5. The i-th tree has node set Vi = and edge set 
Ei, for i ~ 2, . . . d — 1. Ei is obtained by inferring 
a spanning tree from a graph Gi] this graph has 
set of nodes Vi and edges e = (61,62) G Ei, such 
that 61, 62 G Ei^i share a common node in Vi-i. 

6. Edges 6 — (61,62) e Ei have conditioned, con- 
ditioning and constraint sets given by C{e) = 
iV(6i)A7V(62), D{e) = N{ei)nN{ei) and iV(6) = 
iV(ei) U A^(62), where AAB^{A\B)U{B\A). 

Each of the edges in the trees Ti,...,Td_i forming 
the vine V is a different factor in the factorization of 
c{ui, . . . , Ud), i.e. a different conditional bivariate cop- 
ula density. Since there are a total of d{d— l)/2 edges, 
V factorizes c(ui, . . . , Ud) as the product of d{d — l)/2 
factors. We now show how to obtain the form of each 
of these factors. For any edge e{j, k) € Tj with condi- 
tioned set C{e) = {j, k} and conditioning set D{e) we 



define c. 



]k\D{e 



to be the bivariate copula density for 



Uj and Mfc given the value of the conditioning variables 
{ui : i £ D{e)}, that is, 

Cjk\D(e) c(Pjp(e),Pi,|£,(e)|u^ : i G D{e)), (3) 

where Pj\D{e) '■— P{uj\ui : i € E){e)) is the conditional 
cdf of Uj given the value of the conditioning variables 
{ui : i G D{ey}. Then, the vine V formed by the 
hierarchy of trees Ti, . . . ,Td-i specifies the following 
factorization for the copula density: 



c(ui, ...,Ud) 



n n 

i=l e(j,k)GEi 



(4) 




C1234 = ei3|0 62310 C34I0 Ci2|3 Cl4|3 C24|13 




Figure 2. Example of the hierarchical construction of an 
R-vine factorization of a copula density of four variables 
c{ui,U2,U3,U4). The edges selected to form each tree are 
highlighted in bold. Conditioned and conditioning sets for 
each node and edge are shown as C(e)|_D(e). 



as shown by Kurowicka & Cooke (2006). 

Figure 2 exemplifies how to construct a regular vine 
that factorizes the copula density c(ui, 1x2, M3, W4) into 
the product of 6 bivariate conditional copula densities. 
The first tree Ti has node set Vi = {1,2,3,4}. The 
edge set Ei is obtained by selecting a spanning tree 
over Gi, the complete graph for the nodes in Vi. Our 
choice for Ei is highlighted in bold in the left-most 
plot in Figure 2. Edges in Ei = { 61, 62, 63} have con- 
ditioned and constraint sets C(6i) = iV(6i) = {1,3}, 
C(62) = ^(62) = {2,3}, C(63) = ^(63) = {3,4} and 
conditioning sets 1^(61) = £'(62) = 0(63) = {0}. The 
second tree in the hierarchy has node set V2 = Ei. In 
this case, we select a spanning tree over G2, a graph 
with node set V2 and edge set formed by pairs of edges 
Ci , 6j G El sharing some common node Vk G Vi . We 
select E2 — {64,65} with conditioned sets C(64) = 
7V(6i)AiV(62) = {1,2} and (7(65) = Af(62)AiV(63) = 
{1, 4}, conditioning sets D(64) = iV(6i) n A^(62) = {3} 
and D{e^) — N{e2) n N{e3) — {3}, and constraint 
sets i:»(64) = iV(6i) U N{e2) = {1,2,3} and 0(65) = 
N{ei) n A^(63) = {1,3,4}. Finally, we build a third 
graph G3 with node set V3 = E2 and only one edge 65 . 
This last edge is the only possible spanning tree and 
has node set V3 and edge set E^ = {cq}. The edge Cq 
has conditioned set C{eQ) = N{ei)AN{e5) = {2,4}, 
conditioning set D{e(i) — N{e4) n A^(65) = {1,3} and 
constraint set iV(66) = A^(64) U iV(65) = {1,2,3,4}. 
The resulting factorization of c(ui, U2, W3, M4) given by 
the tree hierarchy is shown at the bottom of Figure 2. 

There exist many factorizations of a copula density 
c(mi, . . . , Ud) in terms of bivariate copulas. Each fac- 
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torization is determined by the specific choices of the 
spanning trees Ti , . . . , in the algorithm described 
above. In practice, the trees are selected by assigning 
a weight to each edge e{j,k) (copula Cjk\D(e)) in the 
graphs Gi, . . . , Gd-i and then selecting the maximum 
spanning tree at each iteration. A common practice is 
to directly relate the weight of the edge e(j, k) to the 
amount of dependence described by the correspond- 
ing copula Cjk\D(e)- This amount of dependence can 
be measured as the absolute value of the empirical 
Kendall's r correlation coefficient between the sam- 
ples of Uj|D(e) and u^,|£)(-g). The maximum spanning 
tree can then be selected efficiently using Prim's algo- 
rithm (Prim, 1957; Dissman et al., 2012). 

On the first tree of a vine, only pairwise dependencies 
are described, and the corresponding copulas are not 
conditioned. The following trees describe dependen- 
cies between 3, 4, ... and d—1 variables by means of in- 
creasingly deeper conditioning, until completing a full 
description of the joint d— dimensional copula density. 
Since the cost of constructing the full tree hierarchy is 
quadratic in d, one may choose to prune the vine and 
construct only the first d' < {d— 1) trees, ignoring the 
remaining copula densities in the factorization. Since 
the independent copula has pdf constant and equal to 
one, this pruning assumes independence in the higher 
order interactions captured by the ignored copulas. 

2.2. Conditional Dependencies in Vines 

As shown in equations (3) and (4), vine distributions 
require to calculate marginal conditional cdfs and con- 
ditional bivariate copula densities. The number of 
variables to condition on increases as we move deeper 
in the vine hierarchy. In general, to obtain the fac- 
tors corresponding to the i-th tree, we have to condi- 
tion both copula densities and marginal cdfs to i — 1 
variables. The computation of the conditional cdfs ap- 
pearing at tree can be done using the copula func- 
tions from the previous tree Ti_i. In particular, the 
following recursive relationship holds 



dCjk\D{e)\{k} 



dPi 



k\D{e)\{k} 



(5) 



where Cjk\D(e)\{k} '■= G{Pj\D(e)\{k}-,Pk\D(e)\k\ui : i e 
D{e) \ {fc}) is the cdf of the conditional copula den- 
sity Cjfe|£)(e)\{fe} and D{e) \ {k} denotes the condition- 
ing set D{e) with the element k removed (Joe, 1996). 
This derivative has well-known, closed-forms for each 
parametric copula (refer to the Appendix for the Gaus- 
sian copula case). However, we still have to compute 
the conditional bivariate copula densities. A solution 
commonly found in the literature is to assume that the 
copulas Cjk\D(e) in (4) are independent of their condi- 



tioning variables. This is known as the simplifying as- 
sumption for vine copulas (Hoback ct al., 2010). The 
main advantage is that we can construct vine mod- 
els using standard unconditional parametric copulas. 
The disadvantage is that we may fail to capture some 
of the dependencies present in the data. As an alterna- 
tive to the simplifying assumption, we now present a 
general technique to construct conditional parametric 
bivariate copulas. 

3. Proposed Approach to Estimate 
Conditional Bivariate Copulas 

In this section we address the estimation of the condi- 
tional copula of two random variables X and Y given 
a vector of conditioning variables Z = (Zi, . . . , Zii)^ e 
M^. Let Px|z and Py|z be the conditional cdfs of X 
and Y given Z. Patton (2006) shows that the condi- 
tional copula of X and Y given Z is the conditional 
distribution of the random variables U = Px\7,{X\Ti) 
and V — Py|z(y|Z) given Z. We assume a paramet- 
ric bivariate copula for the joint distribution of U and 
V . This type of copulas can often be fully specified in 
terms of Kendall's t rank correlation coefficient (Joe, 
1997). Table 1 shows, for some widely- used copula 
families, the domain of their parameter 6 and the cor- 
responding bijective expressions for as a function of 
Kendall's t. To capture the dependence of the copula 
on Z we introduce a latent function g : M.'^ — >■ [—1,1] 
such that r — ^(Z). The task of interest is then to 
estimate g given observations oi X^Y and Z. 

When Px|z and Py|z are known, we can transform any 
sample of A" , y and Z into a corresponding sample of 
[/, V and Z. Let V ^ {Du,v = {(u», w^lf^i, I?z = 
{zi}"^]^} be such a sample, where ut and Vi and 
are paired. To guarantee that g{x) € [—1,1], we as- 
sume w.l.o.g. that g{x) — 2$(/(x)) — 1, where ^{x) is 
the standard Gaussian cdf and / : K'^ — >■ K is a non- 
linear function that uniquely specifies g. We can infer 
g by placing a Gaussian process prior on / and then 
computing the posterior for / given T) (Rasmussen & 
Williams, 2006). For this, let f be the n-dimensional 
vector such that f = (/(zi), . . . , /(z.^))"^. The prior 
for f given 2?z is 



p(f|2?z)=A^(f|m,K), 



(6) 



where m is a n-dimensional mean vector and K is an 
n X n covariance matrix generated by the covariance 
function or kernel 

k,j EE Cov[/(z,),/(zj)] 

= crexp {-{z, - Zj)'^diag(A)(z,j - z^)} + ctq , (7) 
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where A is a vector of lengthscales and cr, (Jq are am- 
plitude and noise parameters. Then, the posterior dis- 
tribution for f given Vuy and 2?z is 



pi{\Vu.y,Vz) = 



p{Vuy\i)p{f\V2) 

p{Vuy\Vz) 



(8) 



where p{Vuy\t) = Utl<^^^^^h = 2$(/,) - 1), 
p{T^uy\T^z) is a normalization constant and c(-,-|t) 
is the density of a parametric bivariate copula speci- 
fied in terms of Kendall's r. Given a particular value 
of Z such as z*, we can make predictions about the 
conditional distribution of U and V given z* using 

p{u\v*\z*)^ J c{u\v*\T^2^n^l) 

p{r\i, z\ V^)p{i\Vuy, Vz) di df* , (9) 

p(r|f,z^P,) = AA(/*|kTK-if,fc-kTK-ik), k = 
(Cov(/(z*),/(zi)),...,Cov(/(z*),/(z„)))T and k = 
Cov(/(z*), /(z*)). Unfortunately, (8) and (9) cannot 
be computed analytically, so we decide to approximate 
them using Expectation Propagation (EP) (Minka, 
2001). This method approximates each of the n factors 
in p(T>uy\t) with an unnormalized Gaussian distribu- 
tion whose mean and variance parameters are updated 
iteratively by matching sufficient statistics. See Ras- 
mussen & Williams (2006) for further details. To refine 
each of these univariate Gaussians, we have to compute 
three unidimensional integrals using quadrature meth- 
ods. For prediction at z*, we sample /(z*) from the 
Gaussian approximation found by EP and then aver- 
age over copula models with r = 2<i>(/(z*)) — 1. The 
resulting conditional copula model is semi-parametric: 
The dependence between U and V given Z is paramet- 
ric but the effect of Z on the copula is non-parametric. 

3.1. Sparse GPs to Speed up Computations 

The total cost of EP is 0{n^), since it is dominated 
by the computation of the Cholesky decomposition of 
an n X n matrix. To reduce this cost, we use the 
FITC approximation for Gaussian Processes (Snelson 
& Ghahramani, 2006; Naish-Guzman & Holden, 2007). 
Under this approximation, the nxn covariance matrix 
K is approximated by K' = Q -f diag(K — Q), where 
Q = K„„„K,;;^;^„^Kj^„^^, K„o„o is the uq x uq covariance 
matrix generated by evaluating (7) at all combinations 
of some uq <^ n training points or pseudo-inputs, and 
K„„(, is the n x uq matrix with the covariances between 
all possible combinations of original training points 
and pseudo-inputs. These approximations allow us to 
run the EP method with cost 0(7ing). The kernel 
hyper-parameters A, a and do ^^nd the pseudo-inputs 
are optimized by maximizing the EP approximation of 
the model evidence (Rasmussen & Williams, 2006). 



Table 1. Some copula families, with their parameter do- 
mains and expressions as Kendall's r correlations. For 
some copulas like Frank or Joe, numerical approximations 
are used to compute 0{t). 



Family 


Parameter 


9(r) = 


Gaussian 


[-1,1] 




Student 


[-1,1] 


Clayton 


e G (o,oo) 


2r/(l-r) 


Gumbel 


e e [i,oo) 


1/(1 -r) 


Frank 


9 e (0,oo) 


No closed form 


Joe 


9 e (i,oo) 



Learning a vine with our method scales linearly with 
the number of samples n, but quadratically with the 
number of pseudo-inputs Uq and the number of vari- 
ables d: thus, its complexity is O(d^rtgn). By contrast, 
learning a simplified vine has complexity 0{d^). 

3.2. Related Work 

Acar, Genest and Neslehova (2012) first addressed the 
lack of conditional dependencies in the parametric cop- 
ulas that form a vine model. They use a method sim- 
ilar in spirit to the one described above, and model t 
as a non-linear function of a single conditioning vari- 
able Z (Acar et al., 2011). However, their method 
cannot handle multivariate conditional dependences 
and consequently, they only show results for trivari- 
ate vines. They use the Maximum Local Likelihood 
(MLL) method to infer a non-linear relationship be- 
tween T and Z. Acar et al. approximate linearly / 
at any point z where / needs to be evaluated. Then, 
they adjust the coefficients of the resulting linear form 
using the available observations in the neighborhood 
of z. In particular, given a sample V — {Vuy = 
{{ui,v,)}f^j^,Vz = {zjf^i}, they estimate f{z) by 
solving the optimization problem 



N 



{KflX.i) = argmax<^ Vfc,j(^ 



Zi) 



. i=0 



logc(u,, v,\t = 2$(foo + bi{z ~ Zi)) - 1)} , 

(10) 

where the neighborhood of z is determined by the 
Epanechnikov kernel kh{x) — max(0, 1 — (|^)^) with 
bandwidth h. An estimate of f{z) is then obtained as 
the intercept of the linear approximation at z, that is, 
f{z) w 6Jg. Acar et al. adjust h by running a leave- 
one-out cross validation search on the training data. 
Some disadvantages of the MLL method are: (i) it can 
only condition on a single scalar variable, (ii) we have 
to solve the optimization problem (10) for each predic- 
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tion that we want to make and more importantly (iii) 
since it is a local-based method (similarly as nearest 
neighbours) it can lead to poor predictive performance 
when the available data is sparsely distributed. 

4. Experiments 

We evaluate the performance of the proposed method 
for the estimation of vine copula densities with full 
conditional dependencies. Because GPs are an impor- 
tant part in this method we call it GPVINE. We com- 
pare with two benchmark methods: (i) a vine model 
based on the simplifying assumption (SVINE), which 
ignores any conditional dependencies in the bivari- 
ate copulas, and (ii) a vine model based on the MLL 
method of Acar ct al. (2012) (MLLVINE). MLLVINE 
can only handle conditional dependencies with respect 
on a single scalar variable. Therefore, we can only eval- 
uate its performance in the construction of vine models 
with two trees, since additional levels would require to 
account for multivariate conditional dependencies. 

In all the experiments, we use 20 pseudo-inputs in the 
generalized FITC approximation. The Gaussian pro- 
cesses use a kernel function given by (7), whose hyper- 
parameters and pseudo-inputs are tuned by maximiz- 
ing the EP estimate of the marginal likelihood. The 
mean of the GP prior (6) is chosen to be constant 
and equal to $~^((tml£; + l)/2), where tmle is the 
maximum likelihood estimate of r for an uncondi- 
tional Gaussian copula given the training data. In 
MLLVINE, the bandwidth of the Epanechnikov kernel 
is selected by running a leave-one-out cross validation 
search using a 30-dimensional log-spaced grid ranging 
from 0.05 to 10. To simplify the experiments, we fo- 
cus on regular vines generated using bivariate Gaus- 
sian copulas (see Appendix A) as building blocks. The 
extension of the proposed approach to select among 
different parametric families of bivariate copulas is 
straightforward: the best family for a given pair of 
variables can be selected by Bayesian model selection, 
using the evidence approximation given by EP. All the 
data are preprocessed to have uniform marginal distri- 
butions: this is done by mapping each marginal obser- 
vation to its empirical cumulative probability. 

4.1. Synthetic Data 

We first perform a series of experiments with synthetic 
three-dimensional data. In particular, we sample the 
scalar variables X, Y and Z according to the follow- 
ing generative process. First, Z is sampled uniformly 
from the interval [—6, 6] and second, X and Y are 
sampled given Z from a bivariate Gaussian distribu- 
tion with zero mean and covariance matrix given by 



Table 2. Average test log-likelihood and standard devi- 
ations for SVINE and GPVINE on real-world datasets 
(higher is better). Asterisks denote results that are not 
statistically significant to a paired Wilcoxon test with 
p-value =10"^. 
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cloud glass housing: .jura shuttle weather stocks 
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Number of trees 

Figure 3. Average test log-likelihood and standard deviations for each dataset as the number of trees forming the vines 
increases, for GPVINE (red triangles) and SVINE (black dots) (higher is better). 
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Figure 4. In green, the true function g that maps 113 to r. 
The approximations learned by GPVINE and MLLVINE 
are shown in red and blue, respectively. For GPVINE, the 
uncertainty of the prediction (±1 standard deviation) is 
drawn as a red shaded area. Small dots are used to mark 
the available observations of U3 for training. 



Var(X) = Var(y) = 1 and Cov{X, Y\Z) = 3/4sin(Z). 
We sample a total of 1000 data points and then choose 
50 subsamples of size 100 to infer a vine model for the 
data using SVINE, MLLVINE and GPVINE. Average 
test log-likelihoods on the remaining data points are 
shown in Table 3. GPVINE obtains the best results. 

Figure 4 displays the true value of the function 
g that maps U3 to r in the conditional copula 
c(P(ui 1^3), P(mi|m3)|m3, r), where ui, U3 and U3 are 
the empirical cumulative probability levels of the sam- 
ples generated for X, Y and Z, respectively. We also 
show the approximations of g generated by GPVINE 
and MLLVINE. In this case, GPVINE is much better 
than MLLVINE at approximating the true g. 

4.2. Real-world Datasets 

We evaluate the performance of SVINE, MLLVINE 
and GPVINE on several real-world datasets. For each 
dataset, we generate 50 random partitions of the data 
into training and test sets, each containing half of the 
available data. The different methods are run on each 
training set and their log-likelihood is then evaluated 
on the corresponding test set (higher is better). The 
analyzed datasets are described in Section 4.2.1. Table 
2 and Figure 3 show the test log-likelihood for SVINE 



and GPVINE, when using up to 1, . . . , (d — 1) trees in 
the vine, where d is the number of variables in the data. 
In general, taking into account possible dependencies 
in the conditional bivariate copulas leads to superior 
predictive performance. Also, we often find that the 
gains obtained get larger as we increase the number of 
trees in the vines. However, in a few of the datasets the 
simplifying assumption seems valid (Stocks and Jura 
datasets). Table 3 shows results for all methods (in- 
cluding MLLVINE) when only two trees are used in 
the vines. In these experiments, MLLVINE is most of 
the times outperformed by GPVINE. To better mea- 
sure the percent improvement experienced when us- 
ing GPVINE, one can subtract the achieved likelihood 
when using only the first tree of the vine from the 
all results. We also show how GPVINE can be used 
to discover scientifically interesting features through 
learning spatially varying correlations (Figure 5). 

4.2.1. Description of the Datasets 

Mineral Concentrations The jura dataset con- 
tains the concentration measurements of 7 chemical 
elements (Cd, Co, Cr, Cu, Ni, Pb, Cn) in 359 loca- 
tions of the Swiss Jura Mountains (Goovaerts, 1997). 
The uranium dataset contains log-concentrations of 7 
chemical elements (U, Li, Co, K, Cs, Sc, Ti) in a total 
of 655 water samples collected near Grand Junction, 
CO (Cook & Johnson, 1986). Acar et al. (2012) use 
the measurements for Co, Ti and Sc to evaluate the 
performance of MLLVINE. We replicated this task for 
the three analyzed methods (Table 3). 

Barcelona Weather OpenWeatherMap (Extreme 
Electronics Ltd., 2012) provides access to meteorologi- 
cal stations around the world. We downloaded data for 
the 300 weather stations nearest to Barcelona, Spain 
(41.3857N, 2.1699E) on 11/19/2012 at 8pm {weather 
dataset). Each station returns values for longitude, lat- 
itude, distance to Barcelona, temperature, atmospheric 
pressure, humidity, wind speed, wind direction and 
cloud cover percentage. Figure 5 shows how the poste- 
rior mean of r for the copula linking the variables at- 
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Table 3. Average test log-likelihood and standard devia- 
tions for all methods and datasets when limited to 2 trees 
in the vine (higher is better). 



Data 


SVINE 


MLLVINE 


GPVINE 


Synthetic 


-0.005 ±0.012 


0.101 ±0.162 


0.298 ±0.031 


Uranium 


0.006 ±0.006 


0.016 ±0.026 


0.022 ±0.012 


Cloud 


8.899 ±0.334 


9.013 ±0.600 


9.335 ±0.348 


Glass 


1.206 ±0.259 


0.460 ±1.996 


1.264 ± 0.303 


Housing 


3.975 ± 0.342 


4.246 ±0.480 


4.487 ± 0.386 


Jura 


2.134 ±0.164 


2.125 ±0.177 


2.151 ± 0.173 


Shuttle 


2.552 ± 0.273 


2.256 ±0.612 


3.645 ± 0.427 


Weather 


0.789 ±0.159 


0.771 ±0.890 


1.312 ±0.227 


Stocks 


2.802 ±0.141 


2.739 ±0.155 


2.785 ±0.146 



mospheric pressure and cloud cover percentage varies 
when conditioned on latitude and longitude. 

World Stock Indices We apply the probability in- 
tegral transform to the residuals of an ARM A (1,1 )- 
GARCH(1,1) model with Student t innovations. The 
residuals are obtained after fitting this model to the 
daily log-returns of the major world stock indices in 
2009 and 2010 {stocks dataset, 396 points in total) 
(Dissman et al., 2012). The considered indices are the 
US American S&P 500, the Japanese Nikkei 225, the 
Chinese SSE Composite Index, the German DAX, the 
French CAC 40 and the British FTSE 100 Index. 

UCI Datasets We also include experimental results 
for the Glass, Housing, Cloud and Shuttle datasets 



from the UCI Dataset Repository (Frank & Asuncion, 
2010). 

5. Conclusion 

Vine copulas are increasingly popular models for mul- 
tivariate data. They specify a factorization of any 
high-dimensional copula density into a product of con- 
ditional bivariate copulas. However, some of the con- 
ditional dependencies in these bivariate copulas are 
usually ignored when constructing the vine. This 
can produce overly simplistic estimates when dealing 
with real-world data. To avoid this, we presented a 
method for the estimation of fully conditional vines 
using Gaussian processes (GPVINE). A series of ex- 
periments with synthetic and real-world data show 
that, often, GPVINE obtains better predictive perfor- 
mance than a baseline method that ignores conditional 
dependencies. Additionally, GPVINE performs favor- 
ably with respect to state-of-the-art alternatives based 
on maximum local- likelihood methods (MLLVINE). 
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Figure 5. Kendall's r correlation between atmospheric 
pressure and cloud percentage cover (color scale) when con- 
ditioned to longitude and latitude. The blue region in the 
plot corresponds to the Pyrenees mountains. 



A. The Bivariate Gaussian Copula 

The bivariate Gaussian copula with correlation param- 
eter 6 represents the dependence structure found in a 
bivariate Gaussian distribution of two random vari- 
ables with correlation 6. The Gaussian copula has cdf 

Ciu,v\0) = $2($"'(w),$-i(i;))|0), (11) 

where <I'2(-, •|6') is the cdf of a bivariate Gaussian with 
marginal variances equal to one and correlation 9, and 
is the quantile function of the standard Gaussian 
distribution. The corresponding pdf is 



c{u, v\ 



(12) 



(/)($-i(u))9!'(*"H^')) 

where (j>2 is the derivative (pdf) of $2 ■ The conditional 
cdfs are given by 

dC{u, V 
du 
dC{u, V 



= $ 



dv 



= $ 



(13) 
(14) 



where $ is the standard Gaussian cdf. 
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